Abstract: In this paper, we tackle the open problem of snap-stabilization in message-passing
systems. Snap-stabilization is a nice approach to design protocols that withstand transient
faults. Compared to the well-known self-stabilizing approach, snap-stabilization guarantees
that the effect of faults is contained immediately after faults cease to occur. Our
contribution is twofold: we show that (1) snap-stabilization is impossible for a wide class
of problems if we consider networks with finite yet unbounded channel capacity; (2)
snap-stabilization becomes possible in the same setting if we assume bounded-capacity
channels. We propose three snap-stabilizing protocols working in fully-connected networks.
Our work opens exciting new research perspectives, as it enables the snap-stabilizing
paradigm to be implemented in actual networks.
Snap-Stabilization in Message-Passing Systems

Summary: In this article, we consider the hitherto open problem of snap-stabilization in
message-passing systems. Snap-stabilization is an elegant approach for building protocols
that tolerate transient faults. Compared to the self-stabilizing approach, snap-stabilization
ensures that the effect of faults is contained immediately after they cease. Our contribution
is twofold: we prove that (1) snap-stabilization is impossible for many problems if we assume
networks where the capacity of the communication channels is finite but unbounded; (2)
snap-stabilization becomes possible with the same parameters if we assume that the channel
capacity is bounded. As an example, we propose three snap-stabilizing protocols operating
in a complete network. This work opens new research perspectives because it demonstrates
that snap-stabilization can be implemented in current networks.

Keywords: Distributed systems, Distributed algorithm, Self-stabilization, Snap-stabilization
1 Introduction

Self-stabilization [23] is an elegant approach to forward failure recovery. Regardless of the
global state to which the failure drives the system, after the influence of the failure stops, a
self-stabilizing system is guaranteed to resume correct operation. This guarantee comes at
the expense of temporary safety violation. That is, a self-stabilizing system may behave
incorrectly as it recovers. Bui et al. [11] introduce a related concept of snap-stabilization.
Given a problem specification, a snap-stabilizing system is guaranteed to perform according
to this specification regardless of the initial state. If the system is sensitive to safety
violation, snap-stabilization becomes an attractive option. However, the snap-stabilizing
protocols presented thus far assume a rather abstract shared memory model. In this model, a
process reads the states of all of its neighbors and updates its own state in a single atomic
step. Designing protocols with forward recovery mechanisms such as self- and
snap-stabilization under a more concrete program model such as asynchronous message-passing
is rather challenging. As Gouda and Multari [26] demonstrate, if channels can hold an
arbitrary number of messages, a large number of problems cannot be solved by self-stabilizing
algorithms: a pathological corrupted state with incorrect messages in the channels may
prevent the protocol from stabilizing. See also Katz and Perry [29] for additional detail on
this topic. The issue is exacerbated for snap-stabilization by the stricter safety
requirements. Thus, however attractive the concept, the applicability of snap-stabilization
to concrete models, such as message-passing models, remained open. In this paper we address
this problem. We outline the bounds of the achievable and present snap-stabilizing solutions
in message-passing systems for several practical problems.

Related literature. Several studies modify the concept of self-stabilization to add a safety
property during recovery from faults. Dolev and Herman [24] introduce super-stabilization,
where a self-stabilizing protocol can recover from a local fault while satisfying a safety
predicate. This theme is further developed as fault-containment [25].

A number of snap-stabilizing protocols are presented in the literature. In particular,
propagation of information with feedback (PIF) is a popular problem to address [11, 10, 12,
20, 14, 9, 19]. Several studies present snap-stabilizing token circulation protocols [30, 16, 18].
There also exist snap-stabilizing protocols for neighborhood synchronization [28], binary
search tree construction [8] and cut-set detection [17]. Cournier et al. [15] propose a method
to add snap-stabilization to a large class of protocols.

Unlike snap-stabilizing protocols, self-stabilizing protocols were designed for
message-passing systems with unbounded capacity channels. Afek and Brown [2] use a string of
random sequence numbers to counteract the problem of infinite-capacity channels and design a
self-stabilizing alternating-bit protocol (ABP). Delaet et al. [22] propose a method to design
self-stabilizing protocols for a class of terminating problems in message-passing systems
with lossy channels of unbounded capacity. Awerbuch et al. [6] describe the property of local
correctability and demonstrate how to design locally-correctable self-stabilizing protocols.
Researchers also consider message-passing systems with bounded capacity channels
[1, 33, 27, 5, 7].
Our contribution. In this paper, we address the problem of snap-stabilization in
message-passing systems. We introduce the concept of safety-distributed problem specification
that encompasses most practical problems and show that it is impossible to satisfy by a
snap-stabilizing protocol in message-passing systems with unbounded finite channel capacity,
that is, if the channel capacity bound is unknown to the processes. As a constructive
contribution, we show that snap-stabilization becomes possible if a bound on the channel
capacity is known. We present snap-stabilizing protocols that solve the PIF, the ID-learning
and the mutual exclusion problems. To the best of our knowledge these are the first
snap-stabilizing protocols in such a concrete program model.

Paper outline. The rest of the paper is organized as follows. We define the message- 
passing program model in Section 2. In the same section, we describe the notion of snap- 
stabilization and problem specifications. In Section 3, we prove the impossibility of snap- 
stabilization in message-passing systems with channels of infinite capacity. We present the 
snap-stabilizing algorithms for the system with bounded capacity channels in Section 4. We 
conclude the paper in Section 5. 

2 The Model 

We consider distributed systems having a finite number of processes and a fully-connected
topology: any two distinct processes can communicate by sending messages through
a bidirectional link (i.e., two channels in opposite directions).

A process is a sequential deterministic machine that uses a local memory, a local al- 
gorithm, and input/output capabilities. Intuitively, such a process executes a local algo- 
rithm. This algorithm modifies the state of the process memory, and sends/receives messages 
through channels. 

We assume that the channels incident to a process are locally distinguished by a channel
number. For the sake of simplicity, we assume that every process numbers its channels from 1
to n - 1 (n being the number of processes). In the following, we will indifferently use the
notation q to designate the process q or the local channel number of q in the code of some
process p. We assume that the channels are FIFO but not necessarily reliable (messages can be
lost). However, they all satisfy the following property: if an origin process o sends infinitely
many messages to a destination process d, then infinitely many messages are eventually
received by d from o. Also, we assume that any message that is never lost is received in a
finite (but unbounded) time.

The messages are of the following form: ⟨message-type, message-value⟩. The message-value
field is omitted if the message does not carry any value. The messages can
contain more than one message-value.

A protocol consists of a collection of actions. An action is of the following form:
⟨label⟩ :: ⟨guard⟩ → ⟨statement⟩. A guard is a boolean expression over the variables of a
process and/or an input message. A statement is a sequence of assignments and/or message
sendings. An action can be executed only if its guard is true. We assume that the actions
are atomically executed, meaning that the evaluation of the guard and the execution of the
corresponding statement of an action, if executed, are done in one atomic step. An action is
said to be enabled when its guard is true. When several actions are simultaneously enabled at a
process p, all these actions are sequentially executed following the order of their appearance
in the text of the protocol.
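
The guarded-action model can be illustrated with a minimal Python sketch. The names `Action` and `step`, and the two toy actions, are hypothetical and not part of the paper; the sketch also assumes that each guard is evaluated when the action's turn comes in text order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]  # local memory of one process (illustrative encoding)

@dataclass
class Action:
    label: str
    guard: Callable[[State], bool]      # boolean expression over the process variables
    statement: Callable[[State], None]  # sequence of assignments

def step(state: State, actions: List[Action]) -> List[str]:
    """Execute every enabled action once, in order of appearance in the text."""
    fired = []
    for action in actions:
        # Guard evaluation and statement execution are treated as one atomic step.
        if action.guard(state):
            action.statement(state)
            fired.append(action.label)
    return fired

# Toy protocol, for illustration only.
actions = [
    Action("A1", lambda s: s["x"] < 2, lambda s: s.update(x=s["x"] + 1)),
    Action("A2", lambda s: s["y"] == 0, lambda s: s.update(y=1)),
]

state = {"x": 0, "y": 0}
first = step(state, actions)    # both actions are enabled: A1 then A2
second = step(state, actions)   # only A1 is still enabled
```

After the two calls, `state` is `{"x": 2, "y": 1}` and no action remains enabled.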

We reduce the state of each process to the state of its local memory, and the state of 
each link to its content. Hence, the global state of the system, referred to as configuration, 
can be simply defined as the product of the states of the memories of processes and of the 
contents of the links. 

A distributed system can be described using a transition system [32]. A transition system
is a triple $S = (C, \mapsto, I)$ such that: $C$ is a set of configurations, $\mapsto$ is a binary
transition relation on $C$, and $I \subseteq C$ is the set of initial configurations. Using the
notion of transition system, we can model the executions of a distributed system as follows:
an execution of $S = (C, \mapsto, I)$ is a maximal sequence of configurations
$\gamma_0, \ldots, \gamma_{i-1}, \gamma_i, \ldots$ such that $\gamma_0 \in I$ and $\forall i > 0$,
$\gamma_{i-1} \mapsto \gamma_i$ ($\gamma_{i-1} \mapsto \gamma_i$ is referred to as a step). In this
paper, we only consider systems $S = (C, \mapsto, I)$ such that $I = C$.

Snap-Stabilization. In the following, a specification is a predicate defined on the execu- 
tions. 

Definition 1 (Snap-Stabilization [11]) Let $\mathcal{SP}$ be a specification. A protocol
$\mathcal{P}$ is snap-stabilizing for $\mathcal{SP}$ if and only if starting from any
configuration, any execution of $\mathcal{P}$ satisfies $\mathcal{SP}$.

It is important to note that a snap-stabilizing protocol does not guarantee that the system
never works in a fuzzy manner. Actually, the main idea behind snap-stabilization is
the following: the protocol is seen as a function and the function ensures two properties
despite the arbitrary initial configuration of the system: (1) Upon an external (w.r.t. the
protocol) request at a process p, the process p (called the initiator) starts a computation of the
function in finite time using special actions called starting actions. (2) If the process p starts
a computation, then the computation performs an expected task. With such properties, the
protocol always satisfies its specifications. Indeed, when the protocol receives a request,
this means that an external application (or a user) requests the computation of a specific
task provided by the protocol. In this case, a snap-stabilizing protocol guarantees that the
requested task is executed as expected. On the contrary, when there is no request, there is
nothing to guarantee^1.

^1 This latter point is the basis of many misunderstandings about snap-stabilization. Indeed, due to the
arbitrary initial configuration, some computations may initially run in the system without having been
started: of course, snap-stabilization does not provide any guarantee on these non-requested computations.
Consider, for instance, the problem of mutual exclusion. Starting from any configuration, a snap-stabilizing
protocol cannot prevent several (non-requesting) processes from executing the critical section simultaneously.
However, it guarantees that every requesting process executes the critical section in an exclusive manner.
Specifications. Due to the Start and Correctness properties it has to ensure,
snap-stabilization requires specifications based on a sequence of actions (request, start, ...)
rather than a particular subset of configurations (e.g., the legitimate configurations).
Hence, for any task T, we consider specifications of the following form:

- When requested, an initiator starts a computation of T in a finite time. (Start) 

- Any computation of T that is started is correctly performed. (Correctness) 

In this paper, the first two protocols we present are of a particular class: the wave protocols
[32]. The particularity of such protocols is that they compute tasks that are finite and
each of their computations contains at least one decision event that causally depends on an
action at each process. Hence, our specifications for wave protocols contain two additional
requirements:

- Each computation (even non-started) terminates in finite time. (Termination) 

- When the protocol terminates, if a computation was started, then at least one de- 
cision occurred and such a decision causally depends on an action at every process. 
(Decision) 

Self- vs. Snap-Stabilization. Snap-stabilizing protocols are often compared to
self-stabilizing protocols, which converge in a finite time to a specified behavior
starting from any initial configuration ([23]). The main advantage of the snap-stabilizing
approach compared to the self-stabilizing one is the following: while a snap-stabilizing
protocol ensures that any request is satisfied despite the arbitrary initial configuration, a
self-stabilizing protocol often needs to be repeated an unbounded number of times before
guaranteeing the proper processing of any request.

3 Impossibility of Snap-Stabilization in Message-Passing
with Unbounded Capacity Channels

In [3], Alpern and Schneider observe that a specification is an intersection of safety and
liveness properties. In [4], the same authors define a safety property as a set of "bad things"
that must never happen. Hence, it is sufficient to show that a prefix of an execution contains 
a "bad thing" to prove that the execution (and so the protocol) violates the safety property. 
We now consider safety-distributed specifications, i.e., specifications having some safety- 
distributed properties. Roughly speaking, a safety-distributed property is a safety property 
that does not only depend on the behavior of a single process: some local behaviors at 
some processes are forbidden to be executed simultaneously while they are possible and do 
not violate the safety-distributed property if they are executed alone. For example, in the 
mutual exclusion problem, a requesting process eventually executes the critical section but 
no two requesting processes must execute the critical section concurrently. 

We now introduce the notions of abstract configuration, state-projection, and
sequence-projection. These three notions are useful to formalize safety-distributed
specifications.
Definition 2 (Abstract Configuration) We call abstract configuration any configuration
restricted to the state of the processes (i.e., a configuration where the state of each link
has been removed).

Definition 3 (State-Projection) Let $\gamma$ be a configuration and $p$ be a process. The
state-projection of $\gamma$ on $p$, noted $\phi_p(\gamma)$, is the local state of $p$ in $\gamma$.
Similarly, the state-projection of $\gamma$ on all processes, noted $\phi(\gamma)$, is the product
of the local states of all processes in $\gamma$ (n.b. $\phi(\gamma)$ is an abstract configuration).

Definition 4 (Sequence-Projection) Let $s = \gamma_0, \gamma_1, \ldots$ be a configuration sequence
and $p$ be a process. The sequence-projection of $s$ on $p$, noted $\Phi_p(s)$, is the state sequence
$\phi_p(\gamma_0), \phi_p(\gamma_1), \ldots$ Similarly, the sequence-projection of $s$ on all
processes, noted $\Phi(s)$, is the abstract configuration sequence
$\phi(\gamma_0), \phi(\gamma_1), \ldots$
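
The projections of Definitions 2-4 can be made concrete with a short Python sketch. The encoding of a configuration as a pair (process states, link contents) and all function names are assumptions made for illustration only; the paper does not fix any representation.

```python
from typing import Dict, List, Tuple

ProcStates = Dict[str, str]                # process name -> local state
Links = Dict[Tuple[str, str], List[str]]   # (origin, destination) -> channel content
Config = Tuple[ProcStates, Links]          # hypothetical encoding of a configuration

def state_projection(gamma: Config, p: str) -> str:
    """phi_p(gamma): the local state of p in gamma."""
    return gamma[0][p]

def abstract_configuration(gamma: Config) -> Tuple[Tuple[str, str], ...]:
    """phi(gamma): gamma with the state of every link removed."""
    return tuple(sorted(gamma[0].items()))

def sequence_projection(s: List[Config], p: str) -> List[str]:
    """Phi_p(s): the sequence of local states of p along s."""
    return [state_projection(g, p) for g in s]

def abstract_sequence(s: List[Config]) -> List[Tuple[Tuple[str, str], ...]]:
    """Phi(s): the sequence of abstract configurations along s."""
    return [abstract_configuration(g) for g in s]

# Two configurations that differ only in their link contents.
g0: Config = ({"p": "idle", "q": "cs"}, {("p", "q"): ["m1"], ("q", "p"): []})
g1: Config = ({"p": "idle", "q": "cs"}, {("p", "q"): [], ("q", "p"): ["m2", "m3"]})
same_abstract = abstract_configuration(g0) == abstract_configuration(g1)
```

Here `same_abstract` is `True`: configurations with different channel contents can share one abstract configuration, which is the degree of freedom the impossibility argument of this section relies on.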

Definition 5 (Safety-Distributed) A specification $\mathcal{SP}$ is safety-distributed if there
exists a sequence of abstract configurations BAD, called bad-factor, such that:

(1) For each execution $e$, if there exist three configuration sequences $e_0$, $e_1$, and $e_2$
such that $e = e_0 e_1 e_2$ and $\Phi(e_1) = BAD$, then $e$ does not satisfy $\mathcal{SP}$.

(2) For each process $p$, there exists at least one execution $e_p$ satisfying $\mathcal{SP}$ where
there exist three configuration sequences $e_p^0$, $e_p^1$, and $e_p^2$ such that
$e_p = e_p^0 e_p^1 e_p^2$ and $\Phi_p(e_p^1) = \Phi_p(BAD)$.

Almost all classical problems of distributed computing have safety-distributed specifications,
e.g., mutual exclusion and phase synchronization. For example, in mutual exclusion a
bad-factor is any sequence of abstract configurations where several requesting processes
execute the critical section concurrently. We now consider a message-passing system with
unbounded capacity channels and show the impossibility of snap-stabilization for
safety-distributed specifications in that case.

Theorem 1 There exists no safety-distributed specification that admits a snap-stabilizing
solution in message-passing systems with unbounded capacity channels.

Proof. Let $\mathcal{SP}$ be a safety-distributed specification and $BAD = \alpha_0, \alpha_1, \ldots$
be a bad-factor of $\mathcal{SP}$.

Assume, for the purpose of contradiction, that there exists a protocol $\mathcal{P}$ that is
snap-stabilizing for $\mathcal{SP}$. By Definition 5, for each process $p$, there exists an
execution $e_p$ of $\mathcal{P}$ that can be split into three execution factors $e_p^0$,
$e_p^1 = \beta_0, \beta_1, \ldots$, and $e_p^2$ such that $e_p = e_p^0 e_p^1 e_p^2$ and
$\Phi_p(e_p^1) = \Phi_p(BAD)$. Let us denote by $MesSeq_p^q$ the ordered sequence of messages
that $p$ receives from any process $q$ in $e_p^1$. Consider now the configuration $\gamma_0$ such that:

(1) $\phi(\gamma_0) = \alpha_0$.

(2) For each two processes $p$, $q$ such that $p \neq q$, the link $\{p,q\}$ has the following state
in $\gamma_0$:

(a) The messages in the channel from $q$ to $p$ are exactly the sequence $MesSeq_p^q$
(keeping the same order).



RR n° 9999

Sylvie Delaët, Stéphane Devismes, Mikhail Nesterenko, Sébastien Tixeuil



(b) The messages in the channel from $p$ to $q$ are exactly the sequence $MesSeq_q^p$
(keeping the same order).

(It is important to note that we have the guarantee that $\gamma_0$ exists because we assume
unbounded capacity channels. Assuming channels with a bounded capacity $c$, no configuration
satisfies Point (2) if there are at least two distinct processes $p$ and $q$ such that
$|MesSeq_p^q| > c$.)

As $\mathcal{P}$ is snap-stabilizing, $\gamma_0$ is a possible initial configuration of
$\mathcal{P}$. To obtain the contradiction, we now show that there is an execution starting from
$\gamma_0$ that does not satisfy $\mathcal{SP}$. By definition, $\phi(\gamma_0) = \alpha_0$.
Consider a process $p$ and the first two configurations of $e_p^1$: $\beta_0$ and $\beta_1$. Any
message that $p$ receives in $\beta_0 \mapsto \beta_1$ can be received by $p$ in the first step
from $\gamma_0$: $\gamma_0 \mapsto \gamma_1$. Now, $\phi_p(\gamma_0) = \phi_p(\beta_0)$. So, $p$ can
behave in $\gamma_0 \mapsto \gamma_1$ as in $\beta_0 \mapsto \beta_1$. In that case,
$\phi_p(\gamma_1) = \phi_p(\beta_1)$. Hence, if every process $p$ behaves in
$\gamma_0 \mapsto \gamma_1$ as in the first step of its execution factor $e_p^1$, we obtain a
configuration $\gamma_1$ such that $\phi(\gamma_1) = \alpha_1$. By induction principle, there
exists an execution prefix starting from $\gamma_0$, noted PRED, such that $\Phi(PRED) = BAD$. As
$\mathcal{P}$ is snap-stabilizing, there exists an execution SUFF that starts from the last
configuration of PRED. Now, merging PRED and SUFF we obtain an execution of $\mathcal{P}$ that
does not satisfy $\mathcal{SP}$; this contradicts the fact that $\mathcal{P}$ is snap-stabilizing.
□

Intuitively, the impossibility result of Theorem 1 is due to the fact that in a system with 
unbounded capacity channels, any initial configuration can contain an unbounded number 
of messages. If we consider now systems with bounded and known channel capacity, we can 
circumvent the impossibility result by designing protocols that require a number of messages 
that is greater than the bound on the channel capacity to perform their specified task. This 
is our approach in the next section. 

4 Snap-Stabilizing Message-Passing Protocols 

We now consider systems with channels having a bounded capacity. In such systems, we
assume that if a process sends a message in a channel that is full, then the message is lost.
We restrict our study to systems with single-message capacity channels. The extension to an
arbitrary but known bounded message capacity is straightforward (see [6, 7]). We propose
three snap-stabilizing protocols (Algorithms 1-3) for the Propagation of Information with
Feedback (PIF), IDs-Learning, and mutual exclusion problems, respectively. The PIF is
a basic tool allowing us to solve the two other problems. The IDs-Learning is a simple
application of the PIF. Finally, the mutual exclusion protocol uses the two former protocols.
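
The channel assumption above can be sketched as a small Python class. `BoundedChannel` and its method names are hypothetical; the sketch only models the two stated properties (FIFO order, and loss of any message sent into a full channel) and does not model spontaneous message loss.

```python
from collections import deque

class BoundedChannel:
    """FIFO channel of known capacity; a message sent while full is lost."""

    def __init__(self, capacity=1, initial=()):
        self.capacity = capacity
        self.queue = deque(initial)   # arbitrary initial content (transient fault)
        if len(self.queue) > capacity:
            raise ValueError("initial content exceeds the channel capacity")

    def send(self, msg) -> bool:
        """Return True if the message entered the channel, False if it was lost."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(msg)
        return True

    def receive(self):
        """Return the oldest message, or None if the channel is empty."""
        return self.queue.popleft() if self.queue else None

# A single-message channel that initially holds one stale message.
ch = BoundedChannel(capacity=1, initial=["stale"])
lost = not ch.send("fresh")    # the channel is full: "fresh" is lost
first = ch.receive()           # the stale message comes out first
accepted = ch.send("fresh")    # now there is room
second = ch.receive()
```

With capacity 1, an arbitrary initial configuration can hide at most one stale message per channel, which is what the protocols of this section exploit.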

4.1 A PIF Protocol 

The concept of Propagation of Information with Feedback (PIF), also called Wave Propagation,
has been introduced by Chang [13] and Segall [31]. PIF has been extensively studied
in the distributed literature because many fundamental problems, e.g., Reset, Snapshot,
Leader Election, and Termination Detection, can be solved using a PIF-based solution. The
PIF scheme can be informally described as follows: when requested, a process starts the
first phase of the PIF-computation by broadcasting a specific message m into the network
(this phase is called the broadcast phase). Then, every non-initiator acknowledges^2 to the
initiator the receipt of m (this phase is called the feedback phase). The PIF-computation
terminates when the initiator has received acknowledgments from every other process and
decides taking these acknowledgments into account. In distributed systems, any process may
need to initiate a PIF-computation. Thus, any process can be the initiator of a
PIF-computation and several PIF-computations may run concurrently. Hence, any PIF protocol
has to cope with concurrent PIF-computations.

Specification 1 (PIF-Execution) An execution e satisfies PIF-execution(e) if and only
if e satisfies the following four properties:

- Start. When there is a request for a process p to broadcast a message m, p starts a 
PIF-computation in finite time. 

- Correctness. During any PIF-computation started by p for the message m:

- Any process different from p receives m.

- p receives acknowledgments for m from every other process. 

- Termination. Any PIF-computation (even non-started) terminates in finite time. 

- Decision. When a PIF-computation started by p terminates at p, p decides taking into
account only the acknowledgments of the last message it broadcast.

Approach. In the following, we refer to our snap-stabilizing PIF as Protocol PIF. We
describe our approach using a network of two processes: p and q. The generalization to
a fully-connected network of more than two processes is straightforward and presented in
Algorithm 1.

Consider the following example. Each process maintains in the variable Old its own age
and p wants to know the age of q. Then, p performs a PIF of the message "How old are
you?". To that goal, we need the following input/output variables:

- Request_p. This variable is used to manage the PIF requests for p. Request_p is
(externally) set to Wait when there is a request for p to perform a PIF. Request_p is
switched from Wait to In at the start of each PIF-computation (n.b. p starts a
PIF-computation upon a request only). Finally, Request_p is switched from In to Done at
the termination of each PIF-computation (this latter switch also corresponds to the
decision event). Once a PIF-computation is started by p, we assume that p does not
set Request_p to Wait until the termination of the current PIF-computation, i.e., until
Request_p = Done.

- B-Mes_p. This variable contains the message to broadcast.

- F-Mes_q. When q receives the broadcast message, q assigns the acknowledgment message
to F-Mes_q.

^2 An acknowledgment is a message sent by the receiving process to inform the sender about data it has
correctly received.
Using these variables, we perform a PIF of "How old are you?" as follows: PIF.B-Mes_p and
PIF.Request_p are respectively (externally) set to "How old are you?" and Wait, meaning
that we request that p broadcasts "How old are you?" to q. Consequently to this request,
Protocol PIF starts a PIF-computation by setting PIF.Request_p to In and this computation
terminates when PIF.Request_p is set to Done. Between this start and this termination,
PIF generates two events. First, a "receive-brd⟨How old are you?⟩ from p" event at q.
When this event occurs, q sets PIF.F-Mes_q to Old_q so that PIF feeds back the value of Old_q
to p. Protocol PIF then transmits the value of Old_q to p: this generates a "receive-fck⟨x⟩
from q" event at p where x is the value of Old_q.

A naive attempt to implement Protocol PIF could be the following:

- When PIF.Request_p = Wait, p sends a broadcast message containing the data
message PIF.B-Mes_p to q and sets PIF.Request_p to In (meaning that the
PIF-computation is in processing).

- Upon receiving a broadcast message containing the data B, a "receive-brd⟨B⟩ from
p" event is generated at q so that the application (at q) that uses the PIF treats the
message B. Upon this event, the application is assumed to set the feedback message
into PIF.F-Mes_q. Then, q sends a feedback message containing PIF.F-Mes_q to p.

- Upon receiving a feedback message containing the data F, a "receive-fck⟨F⟩ from
q" event is generated at p so that the application (at p) that uses the PIF treats the
feedback and then sets PIF.Request_p to Done.

Unfortunately, such a simple approach is not snap-stabilizing in our system:

(1) Due to the unreliability of the channels, the system may suffer from deadlock. If the
broadcast message from p or the feedback message from q is lost, Protocol PIF never
terminates at p.

(2) Due to the arbitrary initial configuration, the link {p,q} may initially already contain
an arbitrary message in the channel from p to q and another in the channel from q to
p. Hence, after sending the broadcast message to q, p may receive a feedback message
that was not sent by q. Also, q may receive a broadcast message that was not sent by
p: as a consequence, q generates an undesirable feedback message.
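
Problem (2) can be reproduced with a few lines of Python. This is an illustrative sketch only: the function `naive_pif`, the message tags `"brd"` and `"fck"`, and the particular schedule (p receives before q acts whenever a stale message is present) are assumptions, and message loss is not modelled.

```python
from collections import deque

def naive_pif(initial_q_to_p, old_q):
    """Run the naive attempt once and return the feedback p decides on.

    initial_q_to_p: arbitrary initial content of the channel from q to p
                    (at most one message, single-message capacity).
    old_q:          the genuine answer q would give (its variable Old).
    """
    p_to_q = deque([("brd", "How old are you?")])   # p's broadcast
    q_to_p = deque(initial_q_to_p)

    if not q_to_p:
        # No stale message: q handles the broadcast and sends its feedback.
        p_to_q.popleft()
        q_to_p.append(("fck", old_q))

    # p accepts the first feedback message it receives, whatever its origin.
    _kind, value = q_to_p.popleft()
    return value

clean = naive_pif([], old_q=30)                 # q's genuine answer
corrupted = naive_pif([("fck", 99)], old_q=30)  # a stale feedback is taken for q's answer
```

From a clean start p learns 30, but with one stale feedback message in the channel p decides on 99, a value q never sent during this computation.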

To circumvent these two problems, we use two additional variables at each process:

- State_p ∈ {0,1,2,3,4} (resp. State_q) is a flag value that p (resp. q) puts into its
messages.

- NeigState_p (resp. NeigState_q) is equal to the last State_q (resp. State_p) that p
(resp. q) receives from q (resp. p).

(Note that we use a single message type, noted PIF, to manage the PIF-computations
initiated by both p and q.)

Our protocol works as follows: p starts a PIF-computation by setting State_p to 0.
Then, until State_p = 4, p repeatedly sends ⟨PIF,B-Mes_p,F-Mes_p,State_p,NeigState_p⟩ to q.
When q receives ⟨PIF,B,F,pState,qState⟩ (from p), q updates NeigState_q to pState and then
sends a message ⟨PIF,B-Mes_q,F-Mes_q,State_q,NeigState_q⟩ to p if pState < 4 (i.e., if p is
still waiting for a message from q). Finally, p increments State_p only when it receives
a ⟨PIF,B,F,qState,pState⟩ message from q such that State_p = pState and pState < 4.
Hence, after p starts, State_p = 4 only after p successively receives ⟨PIF,B,F,qState,pState⟩
messages (from q) with pState = 0,1,2,3. Now, considering the arbitrary initial value of
NeigState_q and the at most two arbitrary messages initially in the link {p,q} (one in the
channel from p to q and one in the channel from q to p), we are sure that after p starts, p
receives a ⟨PIF,B,F,qState,pState⟩ from q with pState = State_p = 3 only if this message
was sent by q consequently to the reception by q of a message sent by p.

Figure 1 illustrates the worst case of Protocol PIF in terms of configurations. In this
example, p may increment State_p after receiving the initial message with the flag value
pState = 0. Then, if q starts a PIF-computation, q sends messages with the flag value
pState = 1 until receiving (from p) the initial message with the value pState = 2. Hence,
p can still increment State_p twice due to the values 1 and 2 (i.e., State_p then reaches the
value 3). But, after these incrementations, p no longer increments State_p until receiving a
message with the value pState = 3, and q starts sending messages with the value pState = 3
only after receiving a message from p with the value pState = 3. Finally, note that after
receiving a message with the value pState = 3, p increments State_p to 4 and stops sending
messages until the next request. This ensures that if the requests eventually stop, the system
eventually contains no message.

[Figure: the channel from p to q initially holds ⟨PIF,B_p,F_p,pState = 2,qState⟩ and the
channel from q to p initially holds ⟨PIF,B_q,F_q,qState,pState = 0⟩.]

Figure 1: Worst case of Protocol PIF in terms of configurations.
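
The worst case described above can be replayed with a small deterministic simulation. This is a sketch under stated assumptions, not the authors' implementation: the class `Proc`, the helpers `send` and `deliver`, the corrupted values (`"junk"`, 99), the initial value NeigState_q = 1 and the particular schedule are all chosen for illustration. Channels have single-message capacity and a message sent into a full channel is lost.

```python
from collections import deque

class Proc:
    def __init__(self, name, state, neig, b_mes, f_mes):
        self.name, self.state, self.neig = name, state, neig
        self.b_mes, self.f_mes = b_mes, f_mes

    def message(self):
        return ("PIF", self.b_mes, self.f_mes, self.state, self.neig)

def send(src, channel):
    """Single-message channel: a message sent into a full channel is lost."""
    if not channel:
        channel.append(src.message())

def deliver(channel, dst, reply_channel, events, on_brd=None):
    """dst receives the head of `channel` and runs the receive action."""
    _, b, f, sender_state, echoed = channel.popleft()
    if dst.neig != 3 and sender_state == 3:
        events.append((dst.name, "receive-brd", b))
        if on_brd:
            on_brd(dst)                  # the application sets its feedback
    dst.neig = sender_state
    if dst.state == echoed and dst.state < 4:
        dst.state += 1
        if dst.state == 4:
            events.append((dst.name, "receive-fck", f))
    if sender_state < 4:
        send(dst, reply_channel)

def worst_case():
    p = Proc("p", state=0, neig=4, b_mes="How old are you?", f_mes=None)  # p just started
    q = Proc("q", state=0, neig=1, b_mes="junk", f_mes=99)   # 99: corrupted feedback
    p_to_q = deque([("PIF", "junk", None, 2, 4)])   # stale message carrying pState = 2
    q_to_p = deque([("PIF", "junk", 99, 4, 0)])     # stale message echoing pState = 0
    events, trace = [], []
    answer = lambda proc: setattr(proc, "f_mes", 30)   # Old_q = 30

    deliver(q_to_p, p, p_to_q, events); trace.append(p.state)  # stale echo of 0
    send(q, q_to_p)                                            # q echoes its NeigState = 1
    deliver(q_to_p, p, p_to_q, events); trace.append(p.state)  # p's reply is lost (channel full)
    deliver(p_to_q, q, q_to_p, events, answer)                 # q reads the stale 2 and echoes it
    deliver(q_to_p, p, p_to_q, events); trace.append(p.state)  # p reaches 3 and sends it
    deliver(p_to_q, q, q_to_p, events, answer)                 # the genuine broadcast reaches q
    deliver(q_to_p, p, p_to_q, events); trace.append(p.state)  # genuine echo of 3
    return trace, events

trace, events = worst_case()
```

In this run State_p climbs from 0 to 3 on stale data alone, yet the only `receive-fck` event at p carries 30, the value q set after the `receive-brd` event, and never the corrupted 99.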



It remains to see when a process can generate the receive-brd and receive-fck events: 

- q receives at least 4 copies of the broadcast message. But q generates a receive-brd
event only once for each broadcast message: when q switches NeigState_q to 3.

- After it starts, p is sure to receive the "good" feedback only when it receives a message
with pState = State_p = 3. As previously, to limit the number of events, p generates
a receive-fck event only when it switches State_p from 3 to 4. The other copies
are then ignored. Also, note that after receiving this message, p can only receive
duplicates until the next PIF-computation. Hence, when p decides, it decides only
taking the "good" feedbacks into account.

We generalize this snap-stabilizing one-to-one broadcast with feedback to a snap-stabilizing
all-to-all broadcast with feedback (i.e., a PIF) in Algorithm 1. It is important to note that
Algorithm 1 Protocol PIF for any process p

Constant: n: integer, number of processes

Variables:
    Request_p ∈ {Wait, In, Done}                   : input/output variable
    B-Mes_p                                         : data to broadcast, input variable
    F-Mes_p[1 ... n-1]                              : array of messages to feedback, input variable
    State_p[1 ... n-1] ∈ {0,1,2,3,4}^(n-1)          : internal variable
    NeigState_p[1 ... n-1] ∈ {0,1,2,3,4}^(n-1)      : internal variable

Actions:

A1 :: (Request_p = Wait) →
    Request_p ← In                                  /* Start */
    for all q ∈ [1 ... n-1] do
        State_p[q] ← 0
    done

A2 :: (Request_p = In) →
    if (∀q ∈ [1 ... n-1], State_p[q] = 4) then
        Request_p ← Done                            /* Termination */
    else
        for all q ∈ [1 ... n-1] do
            if (State_p[q] ≠ 4) then
                send ⟨PIF,B-Mes_p,F-Mes_p[q],State_p[q],NeigState_p[q]⟩ to q
            end if
        done
    end if

A3 :: receive ⟨PIF,B,F,qState,pState⟩ from q →
    if (NeigState_p[q] ≠ 3) ∧ (qState = 3) then
        generate a "receive-brd⟨B⟩ from q" event
    end if
    NeigState_p[q] ← qState
    if (State_p[q] = pState) ∧ (State_p[q] < 4) then
        State_p[q] ← State_p[q] + 1
        if (State_p[q] = 4) then
            generate a "receive-fck⟨F⟩ from q" event
        end if
    end if
    if (qState < 4) then
        send ⟨PIF,B-Mes_p,F-Mes_p[q],State_p[q],NeigState_p[q]⟩ to q
    end if



our protocol does not prevent processes to generate unexpected receive-brd or receive- fck 

events. Actually, what our protocol ensures is: when a process p starts to broadcast a 
message m, then (1) every other process eventually receives m (receive-brd), (2) p even- 
tually receives a feedback for m from any other process (receive- fck), and (3) p decides 
("PX^.Requestp ^ Done) by only taking the "good" feedbacks into account. Another in- 
teresting property of our protocol is the following: after the first complete computation of 
VIJ^ (from the start to the termination), the channels from and to p contain no message 
from the initial configuration. 
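The handshake of Algorithm 1 can be exercised on a toy system. The following Python sketch is
illustrative only and is not the authors' implementation. It assumes n = 2 (so the arrays
collapse to scalars), single-message-capacity channels in which a message sent into a full
channel is lost, one particular fair round-robin scheduler, and a responder q that is not
itself broadcasting. The names Proc and run_pif and the "ack:" feedback convention are
hypothetical.

```python
WAIT, IN, DONE = "Wait", "In", "Done"


class Proc:
    """One endpoint of Algorithm 1, specialised to a single neighbour (n = 2)."""

    def __init__(self, b_mes, request, state, neig_state):
        self.b_mes = b_mes            # B-Mes: data to broadcast
        self.f_mes = "stale"          # F-Mes: feedback for the neighbour (arbitrary at start)
        self.request = request        # Request in {Wait, In, Done}
        self.state = state            # State[q] in 0..4
        self.neig_state = neig_state  # NeigState[q] in 0..4
        self.events = []              # generated receive-brd / receive-fck events

    def message(self):
        return (self.b_mes, self.f_mes, self.state, self.neig_state)

    def a1(self):
        """A1 (Start): Wait -> In and reset the handshake counter."""
        if self.request == WAIT:
            self.request, self.state = IN, 0

    def a2(self, send):
        """A2: terminate once State = 4, otherwise (re)send the current message."""
        if self.request == IN:
            if self.state == 4:
                self.request = DONE
            else:
                send(self.message())

    def a3(self, msg, send):
        """A3: receive <PIF, B, F, qState, pState> from the neighbour."""
        b, f, q_state, p_state = msg
        if self.neig_state != 3 and q_state == 3:
            self.events.append(("receive-brd", b))
            self.f_mes = "ack:" + str(b)   # hypothetical application-level feedback
        self.neig_state = q_state
        if self.state == p_state and self.state < 4:
            self.state += 1
            if self.state == 4:
                self.events.append(("receive-fck", f))
        if q_state < 4:
            send(self.message())


def run_pif(q_state, q_neig, p_neig, junk_pq, junk_qp, max_rounds=50):
    """Run one PIF of "m" started by p from an arbitrary initial configuration."""
    p = Proc("m", WAIT, state=2, neig_state=p_neig)
    q = Proc("q-data", DONE, state=q_state, neig_state=q_neig)
    chan = {"pq": junk_pq, "qp": junk_qp}   # one slot per direction

    def sender(key):
        def send(msg):
            if chan[key] is None:           # full channel: the new message is lost
                chan[key] = msg
        return send

    def deliver(key, receiver, reply_key):
        msg, chan[key] = chan[key], None
        if msg is not None:
            receiver.a3(msg, sender(reply_key))

    p.a1()
    for _ in range(max_rounds):
        if p.request == DONE:
            break
        p.a2(sender("pq"))
        deliver("pq", q, "qp")
        deliver("qp", p, "pq")
    return p, q


p, q = run_pif(q_state=1, q_neig=3, p_neig=2,
               junk_pq=("old", "oldF", 3, 3), junk_qp=("junk", "staleF", 1, 3))
assert p.request == DONE
assert ("receive-brd", "m") in q.events
assert [e for e in p.events if e[0] == "receive-fck"] == [("receive-fck", "ack:m")]
```

In this run both channels initially hold junk and q wrongly believes that p is already in
state 3, yet the only feedback p takes into account is the one q produced for m. Unexpected
events may still appear (here p sees a receive-brd for q's data), as noted above.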

Proof of Snap-Stabilization. The proof of snap-stabilization of PIF consists in
showing that, despite the arbitrary initial configuration, any execution of PIF always
satisfies the four properties of Specification 1. In the following proofs, the message values
are replaced by "-" when they have no impact on the reasoning.
Lemma 1 (Start) Starting from any configuration, when there is a request for a process p
to broadcast a message, p starts a PIF-computation in finite time.

Proof. We assumed that Request_p is externally set to Wait when there is a request for
the process p to broadcast a message. Moreover, we consider that a process p starts Protocol
PIF by switching Request_p from Wait to In. Now, when Request_p = Wait, Action A1 is
continuously enabled at p and, by executing A1, p sets Request_p to In. Hence, the lemma
holds. □

The following lemmas (Lemmas 2-6) hold assuming that no PIF-computation (even non-started)
can be interrupted by another request:

Hypothesis 1 While Request_p ≠ Done, Request_p is not (externally) set to Wait.

Lemma 2 Consider two distinct processes p and q. Starting from any configuration, if
(Request_p = In) ∧ (State_p[q] < 4), then State_p[q] is eventually incremented.

Proof. Assume, for the purpose of contradiction, that Request_p = In and State_p[q] = i
with i < 4 but State_p[q] is never incremented. Then, from Algorithm 1, Request_p = In
and State_p[q] = i hold forever and, by Actions A2 and A3, we know that:

- p only sends to q messages of the form ⟨PIF,-,-,i,-⟩.

- p sends such messages infinitely many times.

As a consequence, q eventually only receives from p messages of the form ⟨PIF,-,-,i,-⟩ and
q receives such messages infinitely often. By Action A3, NeigState_q[p] = i eventually holds
forever. From that point, any message that q sends to p is of the form ⟨PIF,-,-,-,i⟩. Also,
as i < 4 and q receives infinitely many messages from p, q sends infinitely many messages
of the form ⟨PIF,-,-,-,i⟩ to p (see Action A3). Hence, p eventually receives ⟨PIF,-,-,-,i⟩
from q and, as a consequence, increments State_p[q] (see Action A3), a contradiction. □

Lemma 3 (Termination) Starting from any configuration, any PIF-computation (even
non-started) terminates in finite time.

Proof. Assume, for the purpose of contradiction, that a PIF-computation never terminates
at some process p, i.e., Request_p ≠ Done forever. Then, Request_p = In eventually
holds forever by Lemma 1. Now, by Lemma 2 and owing to the fact that, ∀q ∈ [1 ... n-1],
State_p[q] cannot decrease while the computation is not terminated at p, we can deduce that
p eventually satisfies "∀q ∈ [1 ... n-1], State_p[q] = 4" forever. In this case, p sets
Request_p to Done by Action A2, a contradiction. □

Lemma 4 Let p and q be two distinct processes. After p starts to broadcast a message
from an arbitrary configuration, p switches State_p[q] from 2 to 3 only if the three following
conditions hold:

(1) Any message in the channel from p to q is of the form ⟨PIF,-,-,i,-⟩ with i ≠ 3.

(2) NeigState_q[p] ≠ 3.

(3) Any message in the channel from q to p is of the form ⟨PIF,-,-,-,j⟩ with j ≠ 3.

Proof. p starts to broadcast a message by executing Action A1 (n.b., A1 is the only
starting action of PIF). When p executes A1, p sets (in particular) State_p[q] to 0. From
that point, State_p[q] can only be incremented one by one until reaching value 4. Let us
study the first three incrementations of State_p[q]:

- From 0 to 1. State_p[q] switches from 0 to 1 only after p receives ⟨PIF,-,-,-,0⟩ from
  q (Action A3). As the link {p,q} always contains at most one message in the channel
  from q to p, the next message that p receives from q will be a message sent by q.

- From 1 to 2. From the previous case, we know that State_p[q] switches from 1 to
  2 only when p receives ⟨PIF,-,-,-,1⟩ from q and this message was sent by q. From
  Actions A2 and A3, we can then deduce that NeigState_q[p] = 1 held when q sent
  ⟨PIF,-,-,-,1⟩ to p. From that point, NeigState_q[p] = 1 holds until q receives from p
  a message of the form ⟨PIF,-,-,i,-⟩ with i ≠ 1.

- From 2 to 3. The switching of State_p[q] from 2 to 3 can occur only after p receives
  a message mes1 = ⟨PIF,-,-,-,2⟩ from q. Now, from the previous case, we
  can deduce that p receives mes1 as a consequence of the reception by q of a message
  mes0 = ⟨PIF,-,-,2,-⟩ from p. Now:

  (a) As the link {p,q} always contains at most one message in the channel from p to
      q, after q receives mes0 and until State_p[q] switches from 2 to 3, every message
      in transit from p to q is of the form ⟨PIF,-,-,i,-⟩ with i ≠ 3 (Condition (1) of
      the lemma) because, after p starts to broadcast a message, p sends messages of
      the form ⟨PIF,-,-,3,-⟩ to q only when State_p[q] = 3.

  (b) After receiving mes0, NeigState_q[p] ≠ 3 until q receives ⟨PIF,-,-,3,-⟩. Hence,
      by (a), after q receives mes0 and until (at least) State_p[q] switches from 2 to 3,
      NeigState_q[p] ≠ 3 (Condition (2) of the lemma).

  (c) After receiving mes1, State_p[q] = 3 until p receives ⟨PIF,-,-,-,3⟩ from q. As
      p receives mes1 after q receives mes0, by (b) we can deduce that, after p receives
      mes1 and until (at least) State_p[q] switches from 2 to 3, every message in transit
      from q to p is of the form ⟨PIF,-,-,-,j⟩ with j ≠ 3 (Condition (3) of the lemma).

  Hence, when p switches State_p[q] from 2 to 3, the three conditions (1), (2), and (3)
  are satisfied, which proves the lemma.

□

Lemma 5 (Correctness) Starting from any configuration, if p starts to broadcast a message
m, then:

- Any process different from p receives m.

- p receives acknowledgments for m from every other process.
Proof. p starts to broadcast m by executing Action A1: p switches Request_p from Wait
to In and sets State_p[q] to 0, ∀q ∈ [1 ... n-1]. Then, Request_p remains equal to In until
p decides by Request_p ← Done. Now, p decides in finite time by Lemma 3 and, when p
decides, we have State_p[q] = 4, ∀q ∈ [1 ... n-1] (Action A2). From the code of Algorithm 1,
this means that, ∀q ∈ [1 ... n-1], State_p[q] is incremented one by one from 0 to 4. By Lemma
4, ∀q ∈ [1 ... n-1], State_p[q] is incremented from 3 to 4 only after:

- q receives a message sent by p of the form ⟨PIF,m,-,3,-⟩, and then

- p receives a message sent by q of the form ⟨PIF,-,-,-,3⟩.

When q receives the first ⟨PIF,m,-,3,-⟩ message from p, q generates a "receive-brd⟨m⟩
from p" event and then starts to send ⟨PIF,-,F,-,3⟩ messages to p (†). From that point and
until p decides, q only receives ⟨PIF,m,-,3,-⟩ messages from p. So, from that point and until
p decides, any message that q sends to p acknowledges the reception of m. When p receives
the first ⟨PIF,-,F,-,3⟩ message from q, p generates a "receive-fck⟨F⟩ from q" event and
then sets State_p[q] to 4.

Hence, ∀q ∈ [1 ... n-1], the broadcast of m generates a "receive-brd⟨m⟩ from p" event
at process q and then an associated "receive-fck⟨F⟩ from q" event at p, which proves the
lemma. □



Lemma 6 (Decision) Starting from any configuration, when a PIF-computation started
by p terminates at p, p decides taking into account all acknowledgments of the last message
it broadcast, and only those.

Proof. First, p starts to broadcast a message m by executing Action A1: p switches
Request_p from Wait to In and sets State_p[q] to 0, ∀q ∈ [1 ... n-1]. Then, Request_p remains
equal to In until p decides by Request_p ← Done. Now, (1) p decides in finite time by
Lemma 3, (2) when p decides, we have State_p[q] = 4, ∀q ∈ [1 ... n-1] (Action A2), and (3)
after p decides, each time q receives a message from p with the data m, the message is
ignored (this is a consequence of Claim (2)). From the code of Algorithm 1, we know that
exactly one "receive-fck⟨F⟩ from q" event per neighbor q occurs at p before p decides:
when p switches State_p[q] from 3 to 4. Now, Lemma 5 and Claim (3) imply that each
of these feedbacks corresponds to an acknowledgment for m. Hence, p decides taking into
account all acknowledgments of m, and only those, and the lemma is proven. □

By Lemmas 1, 3, 5, and 6, starting from any arbitrary initial configuration, any execution
of PIF always satisfies Specification 1. Hence:

Theorem 2 Protocol PIF is snap-stabilizing for Specification 1.

Below, we give an additional property of PIF; this property will be used in the
snap-stabilization proof of Protocol ME.

(†) Note to the proof of Lemma 5: q sends a ⟨PIF,-,F,-,3⟩ message to p (at least) each time it receives a ⟨PIF,m,-,3,-⟩ message from p.
Property 1 If p starts a PIF-computation (using Protocol PIF) in the configuration γ_0
and the computation terminates at p in the configuration γ_k, then any message that was in
a channel from and to p in γ_0 is no longer in the channel in γ_k.

Proof. Assume that a process p starts a PIF-computation (using Protocol PIF) in
the configuration γ_0. Then, as PIF is snap-stabilizing for Specification 1, we have the
guarantee that, for every neighbor q of p, at least one broadcast message crosses the channel
from p to q and at least one acknowledgment message crosses the channel from q to p during
the PIF-computation. Now, we assumed that each channel has a single-message capacity.
Hence, every message that was in a channel from and to p in the configuration γ_0 has been
received or lost when the PIF-computation terminates at p in configuration γ_k. □
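Property 1 rests entirely on the single-message capacity of each channel. A minimal Python
sketch of such a channel is given here, assuming (as a modelling choice for this
illustration) that a message sent into a full channel is lost. The class name OneSlotChannel
is hypothetical.

```python
class OneSlotChannel:
    """Channel holding at most one message; sends into a full channel are lost."""

    def __init__(self, initial=None):
        self.slot = initial   # arbitrary content inherited from the initial configuration

    def send(self, msg):
        """Return True if msg entered the channel, False if it was lost."""
        if self.slot is None:
            self.slot = msg
            return True
        return False

    def receive(self):
        """Remove and return the message in transit, or None if the channel is empty."""
        msg, self.slot = self.slot, None
        return msg


chan = OneSlotChannel(initial="stale")
assert chan.send("fresh") is False   # the stale message still occupies the slot
assert chan.receive() == "stale"     # it has to leave before anything new can cross
assert chan.send("fresh") is True
assert chan.receive() == "fresh"     # once a fresh message has crossed, nothing stale remains
```

Since a complete PIF-computation forces at least one fresh message through each direction,
the stale content cannot survive it in this model.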

4.2 An IDs-Learning Protocol

Protocol IDL (its implementation is presented in Algorithm 2) is a simple application of
Protocol PIF. This protocol assumes IDs on processes (ID_p denotes the identity of the
process p) and uses three variables at each process p:

- Request_p ∈ {Wait, In, Done}. The goal of this variable is the same as in PIF.

- minID_p. After a complete execution of IDL (i.e., from the start to the termination),
  minID_p contains the minimal ID of the system.

- ID-Tab_p[1 ... n-1]. After a complete execution of IDL, ID-Tab_p[q] contains the ID of the
  neighbor q.

When requested (IDL.Request_p = Wait) at p, Protocol IDL evaluates the ID of each of
its neighbors q and the minimal ID of the system using Protocol PIF. The results of the
computation are available to p once p decides (when IDL.Request_p ← Done). Based on
the specification of PIF, it is easy to see that IDL is snap-stabilizing for the following
specification:

Specification 2 (IDs-Learning-Execution) An execution e satisfies IDs-Learning-execution(e)
if and only if e satisfies the following four properties:

- Start. When requested, a process p starts an IDs-Learning-computation in finite time.

- Correctness. At the end of any IDs-Learning-computation started by p:

  - ∀q ∈ [1 ... n-1], ID-Tab_p[q] = ID_q.

  - minID_p = min({ID_q, q ∈ [1 ... n-1]} ∪ {ID_p}).

- Termination. Any IDs-Learning-computation (even non-started) terminates in finite
  time.

- Decision. If p is in a terminal state and an IDs-Learning-computation was started by
  p, then p decided knowing the minimal ID of the system and the ID of each of its
  neighbors.

Theorem 3 Protocol IDL is snap-stabilizing for Specification 2.
Algorithm 2 Protocol IDL for any process p

Constant:
    n: integer, number of processes
    ID_p: integer, identity of p
Variables:
    Request_p ∈ {Wait, In, Done}       : input/output variable
    minID_p                             : integer, output variable
    ID-Tab_p[1 ... n-1] ∈ ℕ^(n-1)      : output variable

Actions:
A1 :: (Request_p = Wait) →
        Request_p ← In   /* Start */
        minID_p ← ID_p
        PIF.B-Mes_p ← IDL
        PIF.Request_p ← Wait

A2 :: (Request_p = In) ∧ (PIF.Request_p = Done) →
        Request_p ← Done   /* Termination */

A3 :: receive-brd⟨IDL⟩ from q →
        PIF.F-Mes_p[q] ← ID_p

A4 :: receive-fck⟨qID⟩ from q →
        ID-Tab_p[q] ← qID
        minID_p ← min(minID_p, qID)
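The bookkeeping done by Actions A1 and A4 of Algorithm 2 amounts to a fold over the feedback
events delivered by PIF. A minimal Python sketch follows, assuming the feedbacks are given as
(channel number, ID) pairs. The function name learn_ids and this event encoding are
hypothetical, and the message exchange itself is left to PIF.

```python
def learn_ids(own_id, feedbacks):
    """Fold receive-fck events, given as (channel, q_id) pairs, into (ID-Tab, minID)."""
    min_id = own_id                   # A1: minID_p <- ID_p
    id_tab = {}
    for channel, q_id in feedbacks:   # A4: one feedback per neighbour
        id_tab[channel] = q_id
        min_id = min(min_id, q_id)
    return id_tab, min_id


id_tab, min_id = learn_ids(7, [(1, 12), (2, 3), (3, 9)])
assert id_tab == {1: 12, 2: 3, 3: 9}
assert min_id == 3
```

Because PIF delivers exactly one "good" feedback per neighbor before p decides, the table is
complete and the minimum ranges over all IDs of the system when p decides.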



4.3 A Mutual Exclusion Protocol 

We now consider the problem of mutual exclusion. Mutual exclusion is a well-known mechanism
for allocating a common resource. Indeed, a mutual-exclusion mechanism ensures
that a special section of code, called the critical section (noted ⟨CS⟩ in the following), can be
executed by at most one process at any time. The processes can use their critical section to
access a shared resource. Generally, this resource corresponds to a set of shared variables
in a common store or a shared hardware device (e.g., a printer). The first snap-stabilizing
implementation of mutual exclusion is presented in [21] but in the state model (a stronger
model than the message-passing model). In [21], the authors adopt the following specification*:

Specification 3 (ME-Execution) An execution e satisfies ME-execution(e) if and only
if e satisfies the following two properties:

- Start. Any process that requests the ⟨CS⟩ enters the ⟨CS⟩ in finite time.

- Correctness. If a requesting process enters the ⟨CS⟩, then it executes the ⟨CS⟩
  alone.

Approach. We now propose a snap-stabilizing mutual exclusion protocol called Protocol
ME. The implementation of ME is presented in Algorithm 3. As for the previous solutions,
Protocol ME uses the input/output variable Request. A process p (externally) sets
ME.Request_p to Wait when it requests access to the ⟨CS⟩. Process p is then called a
requestor and is assumed not to execute ME.Request_p ← Wait until ME.Request_p = Done,
i.e., until its current request is done.

* This specification was first introduced and justified in [15].
The main idea of the protocol is the following: we assume IDs on processes and the
process with the smallest ID, called the leader, decides using a variable called Value
which process can execute the ⟨CS⟩. When a process learns that it is authorized to access
the ⟨CS⟩:

(1) It first ensures that no other process can execute the ⟨CS⟩.

(2) It then executes the ⟨CS⟩ if it wishes.

(3) Finally, it notifies the leader that it releases the ⟨CS⟩ so that the leader (fairly)
    authorizes another process to access the ⟨CS⟩.

To apply this scheme, ME executes by phases from Phase 0 to Phase 4 in such a way that each
process goes through Phase 0 infinitely often. For each process p, Phase_p denotes the
phase process p is in. After requesting the ⟨CS⟩ (ME.Request_p ← Wait), a process p can
access the ⟨CS⟩ only after executing Phase 0. Indeed, p can access the ⟨CS⟩ only if
ME.Request_p = In and p switches ME.Request_p from Wait to In only when executing
Phase 0. Hence, our protocol just has to ensure that, after executing its Phase 0, a process
always executes the ⟨CS⟩ alone. Our protocol offers such a guarantee thanks to the five
phases described below:

- Phase 0. When a process p is in Phase 0, it starts a computation of IDL, sets
  ME.Request_p to In if ME.Request_p = Wait (i.e., if p requests the ⟨CS⟩, then the
  protocol takes this request into account), and finally switches to Phase 1.

- Phase 1. When a process p is in Phase 1, p waits for the termination of IDL to know
  (1) the ID of each of its neighbors q (ID-Tab_p[q]) and (2) the leader of the system
  (IDL.minID_p), i.e., the process with the smallest ID. Then, p starts a PIF of the
  message ASK to know which process is authorized to access the ⟨CS⟩ and switches
  to Phase 2. Upon receiving a message ASK from p, any process q answers YES if Value_q
  is equal to the channel number of p at q, NO otherwise. Of course, p will only take the
  answer of the leader into account.

- Phase 2. When a process p is in Phase 2, it waits for the termination of the PIF started
  in Phase 1. After PIF terminates, the answers of all neighbors q of p are stored in
  Privileges_p[q] and, so, p knows if it is authorized to access the ⟨CS⟩. Actually, p is
  authorized to access the ⟨CS⟩ (see Winner(p)) if: (1) p is the leader and Value_p = 0
  or (2) the leader answers YES to p. If p has the authorization to access the ⟨CS⟩,
  p starts a PIF of the message EXIT. The goal of this message is to force all other
  processes to restart to Phase 0. This ensures that no other process executes the ⟨CS⟩ until
  p notifies the leader that it releases the ⟨CS⟩. Indeed, due to the arbitrary initial
  configuration, some process q ≠ p may believe that it is authorized to execute the ⟨CS⟩
  if q never starts Phase 0. On the contrary, after restarting to Phase 0, q cannot receive any
  authorization from the leader until p notifies the leader that it releases the ⟨CS⟩.
  Finally, p terminates Phase 2 by switching to Phase 3.

- Phase 3. When a process p is in Phase 3, it waits for the termination of the last PIF.
  After PIF terminates, if p is authorized to execute the ⟨CS⟩, then p executes the
  ⟨CS⟩ if ME.Request_p = In (i.e., if the system took a request of p into account) and
  then either (1) p is the leader and switches Value_p from 0 to 1 or (2) p is not the
  leader and starts a PIF of the message EXITCS to notify the leader that it releases
  the ⟨CS⟩. Upon receiving such a message, the leader increments its variable Value
  modulo n + 1 to authorize another process to access the ⟨CS⟩. Finally, p terminates
  Phase 3 by switching to Phase 4.

- Phase 4. When a process p is in Phase 4, it waits for the termination of the last PIF
  and then switches to Phase 0.
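The phase discipline described above can be summarised as a small transition function. The
Python sketch below is an illustration under simplifying assumptions: the two booleans
idl_done and pif_done stand for IDL.Request_p = Done and PIF.Request_p = Done, the flag
exit_received stands for a "receive-brd⟨EXIT⟩" event, and the side effects of each phase
(starting IDL or a PIF, executing the ⟨CS⟩) are omitted. The function names are hypothetical.

```python
def guard(phase, idl_done, pif_done):
    """Is the action attached to `phase` (0..4) enabled?"""
    if phase == 0:
        return True        # Phase 0 is always enabled
    if phase == 1:
        return idl_done    # Phase 1 waits for the termination of IDL
    return pif_done        # Phases 2, 3 and 4 wait for the termination of the last PIF


def step(phase, idl_done, pif_done, exit_received=False):
    """Next phase of a process: EXIT forces Phase 0, otherwise advance when enabled."""
    if exit_received:
        return 0
    if guard(phase, idl_done, pif_done):
        return (phase + 1) % 5
    return phase


assert step(0, idl_done=False, pif_done=False) == 1
assert step(1, idl_done=False, pif_done=True) == 1    # still waiting for IDL
assert step(4, idl_done=True, pif_done=True) == 0     # back to Phase 0
assert step(3, idl_done=True, pif_done=True, exit_received=True) == 0
```

Seen this way, a process can leave Phase 0 at any time but can only reach Phase 3, where the
⟨CS⟩ is executed, after a full IDL-computation and at least one full PIF.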

Proof of Snap-Stabilization. We begin the proof of snap-stabilization of Protocol ME
by showing that, despite the arbitrary initial configuration, any execution of ME always
satisfies the correctness property of Specification 3.

Assume that a process p requests the ⟨CS⟩, i.e., ME.Request_p = Wait. Then, p cannot
enter the ⟨CS⟩ before executing Action A0. Indeed:

- p enters the ⟨CS⟩ only if ME.Request_p = In, and

- Action A0 is the only action of ME allowing p to set ME.Request_p to In.

Hence, to show the correctness property of Specification 3 (Corollary 1), we just have to
prove that, despite the initial configuration, after p executes Action A0, if p enters the
⟨CS⟩, then it executes the ⟨CS⟩ alone (Lemma 9).

Lemma 7 Let p be a process. Starting from any configuration, after p executes A0, if p
enters the ⟨CS⟩, then every other process has switched to Phase 0 at least once.

Proof. By checking all the actions of Algorithm 3, we can remark that, after p executes A0,
p must execute the four actions A0, A1, A2, and A3 successively to enter the ⟨CS⟩ (in A3).
Also, to execute the ⟨CS⟩ in Action A3, p must satisfy the predicate Winner(p). The value
of the predicate Winner(p) only depends on (1) the IDL-computation started in A0 and
(2) the PIF of the message ASK started in A1. Now, these two computations are done when p
executes A2. So, the fact that p satisfies Winner(p) when executing A3 implies that p also
satisfies Winner(p) when executing A2. As a consequence, p starts a PIF of the message
EXIT in A2. Now, p executes A3 only after this PIF terminates. Hence, p executes A3 only
after every other process executes A6 (i.e., the feedback of the message EXIT): by this action,
every other process switches to Phase 0. □

Definition 6 (Leader) We call Leader the process of the system with the smallest ID. In
the following, this process will be denoted by ℒ.

Definition 7 (Favour) We say that the process p favours the process q if and only if
(p = q ∧ Value_p = 0) ∨ (p ≠ q ∧ Value_p = q).
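Definition 7 can be made concrete with a small Python sketch. It assumes, for illustration
only, that a process refers to itself by the index 0 and to its neighbors by their local
channel numbers 1 ... n-1. The names SELF, favours and favoured are hypothetical.

```python
SELF = 0   # hypothetical encoding: index 0 denotes the process itself


def favours(value, target):
    """Does a process whose variable Value equals `value` favour `target`?"""
    if target == SELF:
        return value == 0      # p = q and Value_p = 0
    return value == target     # p != q and Value_p = q (channel number of q at p)


def favoured(value, n):
    """All targets (self or channel numbers 1..n-1) favoured for this value."""
    return [t for t in range(n) if favours(value, t)]


n = 5
assert favoured(0, n) == [SELF]
assert favoured(3, n) == [3]
assert all(len(favoured(v, n)) == 1 for v in range(n))
```

For every value in {0 ... n-1}, exactly one process is favoured, which is the fact used in
the proofs that follow: the leader never favours two processes at once.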
Algorithm 3 Protocol ME for any process p

Constant:
    n: integer, number of processes
    ID_p: integer, identity of p
Variables:
    Request_p ∈ {Wait, In, Done}                     : input/output variable
    Phase_p ∈ {0,1,2,3,4}                            : internal variable
    Value_p ∈ {0 ... n-1}                            : internal variable
    Privileges_p[1 ... n-1] ∈ {true,false}^(n-1)     : internal variable
Predicate:
    Winner(p) ≡ (IDL.minID_p = ID_p ∧ Value_p = 0)
                ∨ (∃q ∈ [1 ... n-1], Privileges_p[q] ∧ IDL.ID-Tab_p[q] = IDL.minID_p)

Actions:
A0 :: (Phase_p = 0) →
        IDL.Request_p ← Wait
        if Request_p = Wait then
            Request_p ← In   /* Start */
        end if
        Phase_p ← Phase_p + 1

A1 :: (Phase_p = 1) ∧ (IDL.Request_p = Done) →
        PIF.B-Mes_p ← ASK
        PIF.Request_p ← Wait
        Phase_p ← Phase_p + 1

A2 :: (Phase_p = 2) ∧ (PIF.Request_p = Done) →
        if Winner(p) then
            PIF.B-Mes_p ← EXIT
            PIF.Request_p ← Wait
        end if
        Phase_p ← Phase_p + 1

A3 :: (Phase_p = 3) ∧ (PIF.Request_p = Done) →
        if Winner(p) then
            if (Request_p = In) then
                ⟨CS⟩
                Request_p ← Done   /* Termination */
            end if
            if (IDL.minID_p = ID_p) then
                Value_p ← 1
            else
                PIF.B-Mes_p ← EXITCS
                PIF.Request_p ← Wait
            end if
        end if
        Phase_p ← Phase_p + 1

A4 :: (Phase_p = 4) ∧ (PIF.Request_p = Done) →
        Phase_p ← 0

A5 :: receive-brd⟨ASK⟩ from q →
        if Value_p = q then
            PIF.F-Mes_p[q] ← YES
        else
            PIF.F-Mes_p[q] ← NO
        end if

A6 :: receive-brd⟨EXIT⟩ from q →
        Phase_p ← 0
        PIF.F-Mes_p[q] ← OK

A7 :: receive-brd⟨EXITCS⟩ from q →
        if (Value_p = q) then
            Value_p ← (Value_p + 1) mod (n + 1)
        end if
        PIF.F-Mes_p[q] ← OK

A8 :: receive-fck⟨YES⟩ from q →
        Privileges_p[q] ← true

A9 :: receive-fck⟨NO⟩ from q →
        Privileges_p[q] ← false

A10 :: receive-fck⟨OK⟩ from q →
        /* do nothing */
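The predicate Winner(p) combines the outputs of IDL with the answers collected by the PIF of
the message ASK. A minimal Python sketch follows, assuming ID-Tab_p and Privileges_p are
represented as dictionaries keyed by channel number. This representation and the parameter
names are hypothetical.

```python
def winner(own_id, min_id, value, privileges, id_tab):
    """Winner(p): p is the leader with Value = 0, or the leader answered YES to p."""
    leader_favours_itself = (min_id == own_id and value == 0)
    leader_said_yes = any(
        privileges[q] and id_tab[q] == min_id   # only the leader's answer counts
        for q in id_tab
    )
    return leader_favours_itself or leader_said_yes


# p is the leader (ID 1) and its Value is 0.
assert winner(1, 1, 0, {1: False, 2: False}, {1: 5, 2: 9})
# p is the leader but currently favours the process on channel 2.
assert not winner(1, 1, 2, {1: False, 2: False}, {1: 5, 2: 9})
# p is not the leader; the leader (ID 1, on channel 1) answered YES.
assert winner(5, 1, 0, {1: True, 2: False}, {1: 1, 2: 9})
# A YES coming from a non-leader neighbor is ignored.
assert not winner(5, 1, 0, {1: False, 2: True}, {1: 1, 2: 9})
```

The last case reflects the remark made for Phase 1: every neighbor answers the ASK message,
but p only takes the answer of the leader into account.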
Lemma 8 Let p be a process. Starting from any configuration, after p executes A0, p enters
the ⟨CS⟩ only if the leader favours p until p releases the ⟨CS⟩.

Proof. By checking all the actions of Algorithm 3, we can remark that, after p executes
A0, p must execute the four actions A0, A1, A2, and A3 successively to enter the ⟨CS⟩ (in
A3). Moreover, p executes a complete IDL-computation between A0 and A1. So:

(1) IDL.minID_p = ID_ℒ when p executes A3 (by Theorem 3, IDL is snap-stabilizing for
    Specification 2).

(2) Also, from the configuration where p executes A1, all messages in the channels from
    and to p have been sent after IDL starts at p in Action A0 (Property 1).

Let us now study the two following cases:

- p = ℒ. In this case, when p executes A3, p must satisfy Value_p = Value_ℒ = 0 to enter
  the ⟨CS⟩ by (1). This means that ℒ favours p (actually itself) when p enters the
  ⟨CS⟩. Moreover, as the execution of A3 is atomic, ℒ favours p until p releases the ⟨CS⟩
  and the lemma holds in this case.

- p ≠ ℒ. In this case, when p executes A3, p satisfies IDL.minID_p = ID_ℒ by (1).
  So, p executes the ⟨CS⟩ only if ∃q ∈ [1 ... n-1] such that IDL.ID-Tab_p[q] = ID_ℒ ∧
  Privileges_p[q] = true (see Predicate Winner(p)). To that end, p must receive a
  feedback message YES from ℒ during the PIF of the message ASK started in Action A1.
  Now, ℒ sends such a feedback to p only if Value_ℒ = p when the "receive-brd⟨ASK⟩
  from p" event occurs at ℒ (see Action A5). Also, since ℒ satisfies Value_ℒ = p, ℒ
  updates Value_ℒ only after receiving an EXITCS message from p (see Action A7). Now,
  by (2), after ℒ feeds back YES to p, ℒ receives an EXITCS message from p only if p
  broadcasts EXITCS to ℒ after releasing the ⟨CS⟩ (see Action A3). Hence, ℒ favours p
  until p releases the ⟨CS⟩ and the lemma holds in this case.

□

Lemma 9 Let p be a process. Starting from any configuration, if p enters the ⟨CS⟩ after
executing A0, then it executes the ⟨CS⟩ alone.

Proof. Assume, for the purpose of contradiction, that p enters the ⟨CS⟩ after executing
A0 but executes the ⟨CS⟩ concurrently with another process q. Then, q also executes Action
A0 before executing the ⟨CS⟩ by Lemma 7. By Lemma 8, we have the two following properties:

- ℒ favours p during the whole period where p executes the ⟨CS⟩.

- ℒ favours q during the whole period where q executes the ⟨CS⟩.

This contradicts the fact that p and q execute the ⟨CS⟩ concurrently because ℒ always
favours exactly one process at a time. □

Corollary 1 (Correctness) Starting from any configuration, if a requesting process enters
the ⟨CS⟩, then it executes the ⟨CS⟩ alone.
We now show that, despite the arbitrary initial configuration, any execution of ME always
satisfies the start property of Specification 3.

Lemma 10 Starting from any configuration, every process p switches to Phase 0 infinitely
often.

Proof. Consider the two following cases:

- "receive-brd⟨EXIT⟩" events occur at p infinitely often. Then, each time such an event
  occurs at p, p switches to Phase 0 (see A6). So, the lemma holds in this case.

- Only a finite number of "receive-brd⟨EXIT⟩" events occur at p. In this case, p
  eventually reaches a configuration from which it no longer executes Action A6. From
  this configuration, Phase_p can only be incremented modulo 5 and, depending on the
  value of Phase_p, we have the following possibilities:

  - Phase_p = 0. In this case, A0 is continuously enabled at p. Hence, p eventually
    sets Phase_p to 1 (see Action A0).

  - Phase_p = i with i > 0. In this case, Action A_i is eventually continuously enabled
    due to the termination property of IDL and PIF. By executing A_i, p increments
    Phase_p modulo 5.

  Hence, if only a finite number of "receive-brd⟨EXIT⟩" events occur at p, then Phase_p
  is eventually incremented modulo 5 infinitely often, which proves the lemma in this
  case.

□

Lemma 11 Starting from any configuration, Value_ℒ is incremented modulo n + 1 infinitely
often.

Proof. Assume, for the purpose of contradiction, that Value_ℒ is eventually no longer
incremented modulo n + 1. We can then deduce that ℒ eventually favours some process p
forever.

In order to prove the contradiction, we first show that (*) assuming that ℒ favours p
forever, only a finite number of "receive-brd⟨EXIT⟩" events occur at p. To that end,
assume, for the purpose of contradiction, that an infinite number of "receive-brd⟨EXIT⟩"
events occur at p. Then, as the number of processes is finite, there is a process q ≠ p
that broadcasts EXIT messages infinitely often. Now, every PIF-computation terminates
in finite time (termination property of Specification 1). So, q performs infinitely
many PIFs of the message EXIT. In order to start another PIF of the message EXIT, q must
then successively execute Actions A0, A1, and A2. Now, when q executes A2 after A0 and A1,
IDL.minID_q = ID_ℒ and either (1) q = ℒ and, as q ≠ p, Value_ℒ ≠ 0, or (2) ℒ has fed back
NO to the PIF of the message ASK started by q because Value_ℒ = p ≠ q. In both cases,
q satisfies ¬Winner(q) and, as a consequence, does not broadcast EXIT (see Action A2).
Hence, q eventually stops broadcasting the message EXIT, a contradiction.

Using Property (*), we now show the contradiction. By Lemma 10, p switches to Phase 0
infinitely often. By (*), we know that p eventually stops executing Action A6. So, from
the code of Algorithm 3, we can deduce that p eventually successively executes Actions A0,
A1, A2, A3, and A4 infinitely often. Consider the first time p successively executes A0, A1,
A2, A3, and A4 and study the two following cases:

- p = ℒ. Then, Value_p = 0 and IDL.minID_p = ID_p when p executes A3 because p
  executes a complete IDL-computation between A0 and A1 and IDL is snap-stabilizing
  for Specification 2. Hence, p updates Value_p to 1 when executing A3, a
  contradiction.

- p ≠ ℒ. Then, IDL.minID_p = ID_ℒ when p executes A3 because p executes a complete
  IDL-computation between A0 and A1 and IDL is snap-stabilizing for Specification 2.
  Also, p receives YES from ℒ because p executes a complete PIF of the message
  ASK between A1 and A2 and PIF is snap-stabilizing for Specification 1.
  Hence, p satisfies the predicate Winner(p) when executing A3 and, as a consequence,
  starts a PIF of the message EXITCS in Action A3. This PIF terminates when p executes
  A4: from this point on, we have the guarantee that ℒ has executed Action A7. Now,
  by A7, ℒ increments Value_ℒ, a contradiction.

□

Lemma 12 (Start) Starting from any configuration, any process that requests the ⟨CS⟩
enters the ⟨CS⟩ in finite time.

Proof. Assume, for the purpose of contradiction, that from a configuration γ, a process
p requests but never enters the ⟨CS⟩. Then, Lemma 10 implies that p eventually executes
A0 and, after executing A0, Request_p = In holds forever (Request_p is switched to Done only
after p releases the ⟨CS⟩). From the code of Algorithm 3, we can then deduce that there are
two possibilities after p executes A0:

- p no longer executes A3, or

- p satisfies ¬Winner(p) each time it executes A3.

Consider then the two following cases:

- p = ℒ. Then, Value_p ≠ 0 eventually holds forever, a contradiction to Lemma 11.

- p ≠ ℒ. In this case, p no longer starts any PIF of the message EXITCS. Now, every
  PIF-computation terminates in finite time (termination property of Specification 1).
  Hence, the "receive-brd⟨EXITCS⟩ from p" event eventually no longer occurs at ℒ. As
  a consequence, Value_ℒ eventually no longer switches from value p to (p + 1) mod (n + 1),
  a contradiction to Lemma 11.

□

By Corollary 1 and Lemma 12, starting from any configuration, any execution of ME always
satisfies Specification 3. Hence:

Theorem 4 Protocol ME is snap-stabilizing for Specification 3.
5 Conclusion

We addressed the problem of snap-stabilization in message-passing systems and presented
matching negative and positive results. On the negative side, we showed that snap-stabilization
is impossible for a wide class of specifications, namely the safety-distributed specifications,
in message-passing systems where the channel capacity is finite yet unbounded. On
the positive side, we showed that snap-stabilization is possible (even for safety-distributed
specifications) in message-passing systems if we assume a bound on the channel capacity. The
proof is constructive, as we presented the first three snap-stabilizing protocols for
message-passing systems with a bounded channel capacity. These protocols respectively solve the
PIF, IDs-Learning, and mutual exclusion problems in a fully-connected network.

On the theoretical side, it is worth investigating whether the results presented in this paper
could be extended to more general networks, e.g., with general topologies, and/or where
nodes are subject to permanent (i.e., crash) failures. On the practical side, our result implies
the possibility of implementing snap-stabilizing protocols on real networks, and actually
implementing them is a future challenge.